Convolutional Neural Networks

Project: Write an Algorithm for Landmark Classification


In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following block of code will require additional functionality which you must provide. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!

Note: Once you have completed all the code implementations, you need to finalize your work by exporting the Jupyter Notebook as an HTML document. Before exporting the notebook to HTML, all the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.

The rubric contains optional "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. If you decide to pursue the "Stand Out Suggestions", you should include the code in this Jupyter notebook.


Why We're Here

Photo sharing and photo storage services like to have location data for each photo that is uploaded. With the location data, these services can build advanced features, such as automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. Although a photo's location can often be obtained by looking at the photo's metadata, many photos uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.

If no location metadata for an image is available, one way to infer the location is to detect and classify a discernable landmark in the image. Given the large number of landmarks across the world and the immense volume of images that are uploaded to photo sharing services, using human judgement to classify these landmarks would not be feasible.

In this notebook, you will take the first steps towards addressing this problem by building models to automatically predict the location of the image based on any landmarks depicted in the image. At the end of this project, your code will accept any user-supplied image as input and suggest the top k most relevant landmarks from 50 possible landmarks from across the world. The image below displays a potential sample output of your finished project.

Sample landmark classification output

The Road Ahead

We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.

  • Step 0: Download Datasets and Install Python Modules
  • Step 1: Create a CNN to Classify Landmarks (from Scratch)
  • Step 2: Create a CNN to Classify Landmarks (using Transfer Learning)
  • Step 3: Write Your Landmark Prediction Algorithm

Step 0: Download Datasets and Install Python Modules

Note: if you are using the Udacity workspace, YOU CAN SKIP THIS STEP. The dataset can be found in the /data folder and all required Python modules have been installed in the workspace.

Download the landmark dataset. Unzip the folder and place it in this project's home directory, at the location /landmark_images.

Install the following Python modules:

  • cv2
  • matplotlib
  • numpy
  • PIL
  • torch
  • torchvision

Step 1: Create a CNN to Classify Landmarks (from Scratch)

In this step, you will create a CNN that classifies landmarks. You must create your CNN from scratch (so, you can't use transfer learning yet!), and you must attain a test accuracy of at least 20%.

Although 20% may seem low at first glance, it seems more reasonable after realizing how difficult of a problem this is. Many times, an image that is taken at a landmark captures a fairly mundane image of an animal or plant, like in the following picture.

Bird in Haleakalā National Park

Just by looking at that image alone, would you have been able to guess that it was taken at the Haleakalā National Park in Hawaii?

An accuracy of 20% is significantly better than random guessing, which would provide an accuracy of just 2%. In Step 2 of this notebook, you will have the opportunity to greatly improve accuracy by using transfer learning to create a CNN.

Remember that practice is far ahead of theory in deep learning. Experiment with many different architectures, and trust your intuition. And, of course, have fun!

(IMPLEMENTATION) Specify Data Loaders for the Landmark Dataset

Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.

Note: Remember that the dataset can be found at /data/landmark_images/ in the workspace.

All three of your data loaders should be accessible via a dictionary named loaders_scratch. Your train data loader should be at loaders_scratch['train'], your validation data loader should be at loaders_scratch['valid'], and your test data loader should be at loaders_scratch['test'].

You may find this documentation on custom datasets to be a useful resource. If you are interested in augmenting your training and/or validation data, check out the wide variety of transforms!

In [1]:
%matplotlib inline 
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import glob
from glob import glob
import random
import os
import cv2
from torch import nn
from torch import optim
import torchvision
from torchvision import datasets, models, transforms
import torch.nn.functional as F
import torch.utils.data
from torch.utils.data import Dataset, DataLoader
from torch.utils.data.sampler import SubsetRandomSampler
import pandas as pd
import PIL
from PIL import Image


num_workers = 0
batch_size = 16
valid_size = 0.2
 
train_path = '/data/landmark_images/train'
test_path = '/data/landmark_images/test'

transform = transforms.Compose([transforms.Resize(255),                    
                                transforms.CenterCrop(224),                
                                transforms.ToTensor(),                     
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
 
train_data = datasets.ImageFolder(train_path, transform = transform)

test_data  = datasets.ImageFolder(test_path, transform = transform)
 
num_train = len(train_data)

indices = list(range(num_train))

np.random.shuffle(indices)

split = int(np.floor(valid_size * num_train))

train_idx, valid_idx = indices[split:], indices[:split]
 
train_sampler = SubsetRandomSampler(train_idx)

valid_sampler = SubsetRandomSampler(valid_idx)
 
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)

valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=valid_sampler, num_workers=num_workers)

test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

loaders_scratch = {'train': train_loader, 'test': test_loader, 'valid': valid_loader}

Question 1: Describe your chosen procedure for preprocessing the data.

  • How does your code resize the images (by cropping, stretching, etc)? What size did you pick for the input tensor, and why?
  • Did you decide to augment the dataset? If so, how (through translations, flips, rotations, etc)? If not, why not?

Answer:

  • I Cropped the images to resize, and decided to make it 224 because I will use "VGG16" in Step 2 (Transfer Learning) which won't accept any other sizes because it was trained on size 224

  • I rotated and flipped the images.

(IMPLEMENTATION) Visualize a Batch of Training Data

Use the code cell below to retrieve a batch of images from your train data loader, display at least 5 images simultaneously, and label each displayed image with its class name (e.g., "Golden Gate Bridge").

Visualizing the output of your data loader is a great way to ensure that your data loading and preprocessing are working as expected.

In [2]:
#To view the images clearly

transform_view = transforms.Compose([transforms.Resize(255), transforms.CenterCrop(224), transforms.ToTensor()])
 
train_data_view = datasets.ImageFolder(train_path, transform = transform_view)
 
num_train_view = len(train_data_view)

indices_view = list(range(num_train_view))

np.random.shuffle(indices_view)

split_view = int(np.floor(valid_size * num_train_view))

train_idx_view, valid_idx_view = indices_view[split:], indices_view[:split]
 
train_sampler_view = SubsetRandomSampler(train_idx_view)
 
train_loader_view = torch.utils.data.DataLoader(train_data_view, batch_size=batch_size, sampler=train_sampler_view, num_workers=num_workers)


#view the images

dataiter = iter(train_loader_view)

images, labels = dataiter.next()

images = images.numpy() 

classes_train = os.listdir(train_path)

fig = plt.figure(figsize=(25, 25))


for idx in np.arange(5):
    
    ax = fig.add_subplot(5, 1, idx+1, xticks=[], yticks=[])
    
    plt.imshow(np.transpose(images[idx], (1, 2, 0)))
    
    ax.set_title(classes_train[labels[idx]])

Initialize use_cuda variable

In [3]:
# useful variable that tells us whether we should use the GPU
use_cuda = torch.cuda.is_available()

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_scratch, and fill in the function get_optimizer_scratch below.

In [4]:
criterion_scratch = nn.CrossEntropyLoss()

def get_optimizer_scratch(model):
    
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    
    return optimizer

(IMPLEMENTATION) Model Architecture

Create a CNN to classify images of landmarks. Use the template in the code cell below.

In [5]:
class Net(nn.Module):
    def __init__(self):
       
        super(Net, self).__init__()
        
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1) 
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1) 
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1) 
        self.fc1 = nn.Linear(28*28*64,3000)
        self.fc2 = nn.Linear(3000,1000)
        self.fc3 = nn.Linear(1000,133)
        
        self.dropout = nn.Dropout(0.3)
        self.pool = nn.MaxPool2d(2, 2)
        
        
    def forward(self, x):
        
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = self.pool(x)
        x = x.view(-1, 28*28*64)
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = F.relu(self.fc3(x))

        return x

#-#-# Do NOT modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()

Question 2: Outline the steps you took to get to your final CNN architecture and your reasoning at each step.

Answer:

I built six layers, which are: Three convolutional layers Three fully connected layers

Firstful, the image will go into the convolutional layers with a "3x3" kernel, and padding of one. After that, I reshaped the image to fit into the fully connected layers. A dropout will happen after each layer because the data will go into three fully connected layers, to generalize and avoid overfitting. I used the ReLu activation function after every layer.

(IMPLEMENTATION) Implement the Training Algorithm

Implement your training algorithm in the code cell below. Save the final model parameters at the filepath stored in the variable save_path.

In [7]:
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf 
    
    for epoch in range(1, n_epochs + 1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################

        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()

            optimizer.zero_grad()

            output = model(data)

            loss = criterion(output, target)

            loss.backward()

            optimizer.step()

            train_loss = train_loss + ((1 / (1 + batch_idx)) * (loss.data - train_loss))
 

        ######################    
        # validate the model #
        ######################

        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()

            output = model(data)

            loss = criterion(output, target)

            valid_loss = valid_loss + ((1 / (1 + batch_idx)) * (loss.data - valid_loss))
            
                        
        train_loss = train_loss/len(loaders['train'].dataset)
        
        valid_loss = valid_loss/len(loaders['valid'].dataset)
 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(epoch, train_loss, valid_loss))
    
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(valid_loss_min, valid_loss))
            
            torch.save(model.state_dict(), save_path)
            
            valid_loss_min = valid_loss 
            
            
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format (valid_loss_min, valid_loss))
            
            torch.save(model.state_dict(), save_path)
            
            valid_loss_min = valid_loss 
            
            
        
    return model

(IMPLEMENTATION) Experiment with the Weight Initialization

Use the code cell below to define a custom weight initialization, and then train with your weight initialization for a few epochs. Make sure that neither the training loss nor validation loss is nan.

Later on, you will be able to see how this compares to training with PyTorch's default weight initialization.

In [8]:
def custom_weight_init(m):
    classname = m.__class__.__name__
    # for every Linear layer in a model..
    if classname.find('Linear') != -1:
        # apply a centered, uniform distribution to the weights
        m.weight.data.uniform_(-0.5, 0.5)
        m.bias.data.fill_(0)
    

#-#-# Do NOT modify the code below this line. #-#-#
    
model_scratch.apply(custom_weight_init)

model_scratch = train(20, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch), criterion_scratch, use_cuda, 'ignore.pt')
Epoch: 1 	Training Loss: 1196101147295744.000000 	Validation Loss: 0.000979
Validation loss decreased (inf --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 2 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 3 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 4 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 5 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 6 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 7 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 8 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 9 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 10 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 11 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 12 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 13 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 14 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 15 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 16 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 17 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 18 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 19 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Epoch: 20 	Training Loss: 0.000979 	Validation Loss: 0.000979
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...
Validation loss decreased (0.000979 --> 0.000979).  Saving model ...

(IMPLEMENTATION) Train and Validate the Model

Run the next code cell to train your model.

In [9]:
num_epochs = 100


########################################################################################################################
# I noticed that "Validation Loss" is increasing, and the best model at "Epoch: 16". therefore, I stopped the training #
########################################################################################################################


#-#-# Do NOT modify the code below this line. #-#-#

# function to re-initialize a model with pytorch's default weight initialization
def default_weight_init(m):
    reset_parameters = getattr(m, 'reset_parameters', None)
    if callable(reset_parameters):
        m.reset_parameters()

# reset the model parameters
model_scratch.apply(default_weight_init)

# train the model
model_scratch = train(num_epochs, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch), criterion_scratch, use_cuda, 'model_scratch.pt')
Epoch: 1 	Training Loss: 0.000938 	Validation Loss: 0.000833
Validation loss decreased (inf --> 0.000833).  Saving model ...
Validation loss decreased (0.000833 --> 0.000833).  Saving model ...
Epoch: 2 	Training Loss: 0.000822 	Validation Loss: 0.000803
Validation loss decreased (0.000833 --> 0.000803).  Saving model ...
Validation loss decreased (0.000803 --> 0.000803).  Saving model ...
Epoch: 3 	Training Loss: 0.000797 	Validation Loss: 0.000773
Validation loss decreased (0.000803 --> 0.000773).  Saving model ...
Validation loss decreased (0.000773 --> 0.000773).  Saving model ...
Epoch: 4 	Training Loss: 0.000767 	Validation Loss: 0.000750
Validation loss decreased (0.000773 --> 0.000750).  Saving model ...
Validation loss decreased (0.000750 --> 0.000750).  Saving model ...
Epoch: 5 	Training Loss: 0.000749 	Validation Loss: 0.000740
Validation loss decreased (0.000750 --> 0.000740).  Saving model ...
Validation loss decreased (0.000740 --> 0.000740).  Saving model ...
Epoch: 6 	Training Loss: 0.000737 	Validation Loss: 0.000727
Validation loss decreased (0.000740 --> 0.000727).  Saving model ...
Validation loss decreased (0.000727 --> 0.000727).  Saving model ...
Epoch: 7 	Training Loss: 0.000725 	Validation Loss: 0.000723
Validation loss decreased (0.000727 --> 0.000723).  Saving model ...
Validation loss decreased (0.000723 --> 0.000723).  Saving model ...
Epoch: 8 	Training Loss: 0.000713 	Validation Loss: 0.000707
Validation loss decreased (0.000723 --> 0.000707).  Saving model ...
Validation loss decreased (0.000707 --> 0.000707).  Saving model ...
Epoch: 9 	Training Loss: 0.000693 	Validation Loss: 0.000698
Validation loss decreased (0.000707 --> 0.000698).  Saving model ...
Validation loss decreased (0.000698 --> 0.000698).  Saving model ...
Epoch: 10 	Training Loss: 0.000671 	Validation Loss: 0.000681
Validation loss decreased (0.000698 --> 0.000681).  Saving model ...
Validation loss decreased (0.000681 --> 0.000681).  Saving model ...
Epoch: 11 	Training Loss: 0.000651 	Validation Loss: 0.000669
Validation loss decreased (0.000681 --> 0.000669).  Saving model ...
Validation loss decreased (0.000669 --> 0.000669).  Saving model ...
Epoch: 12 	Training Loss: 0.000631 	Validation Loss: 0.000657
Validation loss decreased (0.000669 --> 0.000657).  Saving model ...
Validation loss decreased (0.000657 --> 0.000657).  Saving model ...
Epoch: 13 	Training Loss: 0.000606 	Validation Loss: 0.000653
Validation loss decreased (0.000657 --> 0.000653).  Saving model ...
Validation loss decreased (0.000653 --> 0.000653).  Saving model ...
Epoch: 14 	Training Loss: 0.000585 	Validation Loss: 0.000632
Validation loss decreased (0.000653 --> 0.000632).  Saving model ...
Validation loss decreased (0.000632 --> 0.000632).  Saving model ...
Epoch: 15 	Training Loss: 0.000557 	Validation Loss: 0.000640
Epoch: 16 	Training Loss: 0.000522 	Validation Loss: 0.000624
Validation loss decreased (0.000632 --> 0.000624).  Saving model ...
Validation loss decreased (0.000624 --> 0.000624).  Saving model ...
Epoch: 17 	Training Loss: 0.000477 	Validation Loss: 0.000628
Epoch: 18 	Training Loss: 0.000426 	Validation Loss: 0.000645
Epoch: 19 	Training Loss: 0.000360 	Validation Loss: 0.000671
Epoch: 20 	Training Loss: 0.000290 	Validation Loss: 0.000707
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-9-4ed0f171f67a> in <module>()
     19 
     20 # train the model
---> 21 model_scratch = train(num_epochs, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch), criterion_scratch, use_cuda, 'model_scratch.pt')

<ipython-input-7-aff23d71d5fa> in train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path)
     14 
     15         model.train()
---> 16         for batch_idx, (data, target) in enumerate(loaders['train']):
     17             # move to GPU
     18             if use_cuda:

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    262         if self.num_workers == 0:  # same-process loading
    263             indices = next(self.sample_iter)  # may raise StopIteration
--> 264             batch = self.collate_fn([self.dataset[i] for i in indices])
    265             if self.pin_memory:
    266                 batch = pin_memory_batch(batch)

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in <listcomp>(.0)
    262         if self.num_workers == 0:  # same-process loading
    263             indices = next(self.sample_iter)  # may raise StopIteration
--> 264             batch = self.collate_fn([self.dataset[i] for i in indices])
    265             if self.pin_memory:
    266                 batch = pin_memory_batch(batch)

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/folder.py in __getitem__(self, index)
    101         sample = self.loader(path)
    102         if self.transform is not None:
--> 103             sample = self.transform(sample)
    104         if self.target_transform is not None:
    105             target = self.target_transform(target)

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py in __call__(self, img)
     47     def __call__(self, img):
     48         for t in self.transforms:
---> 49             img = t(img)
     50         return img
     51 

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py in __call__(self, img)
    173             PIL Image: Rescaled image.
    174         """
--> 175         return F.resize(img, self.size, self.interpolation)
    176 
    177     def __repr__(self):

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/functional.py in resize(img, size, interpolation)
    202             oh = size
    203             ow = int(size * w / h)
--> 204             return img.resize((ow, oh), interpolation)
    205     else:
    206         return img.resize(size[::-1], interpolation)

/opt/conda/lib/python3.6/site-packages/PIL/Image.py in resize(self, size, resample, box)
   1763         self.load()
   1764 
-> 1765         return self._new(self.im.resize(size, resample, box))
   1766 
   1767     def rotate(self, angle, resample=NEAREST, expand=0, center=None,

KeyboardInterrupt: 

(IMPLEMENTATION) Test the Model

Run the code cell below to try out your model on the test dataset of landmark images. Run the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 20%.

In [10]:
def test(loaders, model, criterion, use_cuda):

    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()

    for batch_idx, (data, target) in enumerate(loaders['test']):

        if use_cuda:
            data, target = data.cuda(), target.cuda()

        output = model(data)

        loss = criterion(output, target)
 
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - test_loss))

        pred = output.data.max(1, keepdim=True)[1]

        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (100. * correct / total, correct, total))


model_scratch.load_state_dict(torch.load('model_scratch.pt'))

test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
Test Loss: 3.145069


Test Accuracy: 23% (290/1250)

Step 2: Create a CNN to Classify Landmarks (using Transfer Learning)

You will now use transfer learning to create a CNN that can identify landmarks from images. Your CNN must attain at least 60% accuracy on the test set.

(IMPLEMENTATION) Specify Data Loaders for the Landmark Dataset

Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.

All three of your data loaders should be accessible via a dictionary named loaders_transfer. Your train data loader should be at loaders_transfer['train'], your validation data loader should be at loaders_transfer['valid'], and your test data loader should be at loaders_transfer['test'].

If you like, you are welcome to use the same data loaders from the previous step, when you created a CNN from scratch.

In [11]:
num_workers = 0
batch_size = 16
valid_size = 0.2
 
train_path = '/data/landmark_images/train'
test_path = '/data/landmark_images/test'

transform = transforms.Compose([transforms.Resize(255),                    
                                transforms.CenterCrop(224),                
                                transforms.ToTensor(),                     
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
 
train_data = datasets.ImageFolder(train_path, transform = transform)

test_data  = datasets.ImageFolder(test_path, transform = transform)
 
num_train = len(train_data)

indices = list(range(num_train))

np.random.shuffle(indices)

split = int(np.floor(valid_size * num_train))

train_idx, valid_idx = indices[split:], indices[:split]
 
train_sampler = SubsetRandomSampler(train_idx)

valid_sampler = SubsetRandomSampler(valid_idx)
 
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)

valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=valid_sampler, num_workers=num_workers)

test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

loaders_transfer = {'train': train_loader, 'test': test_loader, 'valid': valid_loader}

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_transfer, and fill in the function get_optimizer_transfer below.

In [12]:
criterion_transfer = nn.CrossEntropyLoss()

def get_optimizer_transfer(model):
    optimizer_transfer = optim.SGD(model_transfer.classifier.parameters(), lr=0.01)
    
    return optimizer_transfer

(IMPLEMENTATION) Model Architecture

Use transfer learning to create a CNN to classify images of landmarks. Use the code cell below, and save your initialized model as the variable model_transfer.

In [13]:
model_transfer = models.vgg16(pretrained=True)

for param in model_transfer.features.parameters():
    param.requires_grad = False

model_transfer.classifier[6] = nn.Linear(4096,50)


#-#-# Do NOT modify the code below this line. #-#-#

if use_cuda:
    model_transfer = model_transfer.cuda()
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.torch/models/vgg16-397923af.pth
100%|██████████| 553433881/553433881 [00:04<00:00, 114910439.72it/s]

Question 3: Outline the steps you took to get to your final CNN architecture and your reasoning at each step. Describe why you think the architecture is suitable for the current problem.

Answer:

First, I froze the parameters in the "VGG16" model.

Then, I Edited the last layer of the classifier, to be suitable for the number of classes we have in the landmarks classes.

(IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_transfer.pt'.

In [14]:
model_transfer = train(20, loaders_transfer, model_transfer, get_optimizer_transfer(model_transfer), criterion_transfer, use_cuda, 'model_transfer.pt')

#-#-# Do NOT modify the code below this line. #-#-#

# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
Epoch: 1 	Training Loss: 0.000419 	Validation Loss: 0.000286
Validation loss decreased (inf --> 0.000286).  Saving model ...
Validation loss decreased (0.000286 --> 0.000286).  Saving model ...
Epoch: 2 	Training Loss: 0.000220 	Validation Loss: 0.000272
Validation loss decreased (0.000286 --> 0.000272).  Saving model ...
Validation loss decreased (0.000272 --> 0.000272).  Saving model ...
Epoch: 3 	Training Loss: 0.000151 	Validation Loss: 0.000240
Validation loss decreased (0.000272 --> 0.000240).  Saving model ...
Validation loss decreased (0.000240 --> 0.000240).  Saving model ...
Epoch: 4 	Training Loss: 0.000101 	Validation Loss: 0.000237
Validation loss decreased (0.000240 --> 0.000237).  Saving model ...
Validation loss decreased (0.000237 --> 0.000237).  Saving model ...
Epoch: 5 	Training Loss: 0.000069 	Validation Loss: 0.000250
Epoch: 6 	Training Loss: 0.000050 	Validation Loss: 0.000246
Epoch: 7 	Training Loss: 0.000033 	Validation Loss: 0.000252
Epoch: 8 	Training Loss: 0.000025 	Validation Loss: 0.000262
Epoch: 9 	Training Loss: 0.000019 	Validation Loss: 0.000258
Epoch: 10 	Training Loss: 0.000014 	Validation Loss: 0.000268
Epoch: 11 	Training Loss: 0.000012 	Validation Loss: 0.000270
Epoch: 12 	Training Loss: 0.000010 	Validation Loss: 0.000276
Epoch: 13 	Training Loss: 0.000008 	Validation Loss: 0.000276
Epoch: 14 	Training Loss: 0.000007 	Validation Loss: 0.000280
Epoch: 15 	Training Loss: 0.000006 	Validation Loss: 0.000280
Epoch: 16 	Training Loss: 0.000005 	Validation Loss: 0.000284
Epoch: 17 	Training Loss: 0.000004 	Validation Loss: 0.000294
Epoch: 18 	Training Loss: 0.000003 	Validation Loss: 0.000294
Epoch: 19 	Training Loss: 0.000004 	Validation Loss: 0.000295
Epoch: 20 	Training Loss: 0.000003 	Validation Loss: 0.000296

(IMPLEMENTATION) Test the Model

Try out your model on the test dataset of landmark images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 60%.

In [15]:
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 0.942927


Test Accuracy: 75% (939/1250)

Step 3: Write Your Landmark Prediction Algorithm

Great job creating your CNN models! Now that you have put in all the hard work of creating accurate classifiers, let's define some functions to make it easy for others to use your classifiers.

(IMPLEMENTATION) Write Your Algorithm, Part 1

Implement the function predict_landmarks, which accepts a file path to an image and an integer k, and then predicts the top k most likely landmarks. You are required to use your transfer learned CNN from Step 2 to predict the landmarks.

An example of the expected behavior of predict_landmarks:

>>> predicted_landmarks = predict_landmarks('example_image.jpg', 3)
>>> print(predicted_landmarks)
['Golden Gate Bridge', 'Brooklyn Bridge', 'Sydney Harbour Bridge']
In [81]:
classes = test_data.classes

def predict_landmarks(img_path, k):
    
    img = Image.open(img_path) 
    
    data_transform = transforms.Compose([transforms.RandomResizedCrop(224), transforms.ToTensor()])
    
    data = data_transform(img)

    data = data.unsqueeze(0)
    
    output = model_transfer(data.cuda())
    
    top_p, top_class = torch.topk(output, k)
    
    preds_classes = np.array(top_class)
    
    class_pred = [classes[i] for i in preds_classes[0]]

    return class_pred


predicted_landmark = predict_landmarks('/data/landmark_images/test/16.Eiffel_Tower/26f82dab964ef649.jpg', 5)

print('This landmark is one of these five below')
print(predicted_landmark)
This landmark is one of these five below
['16.Eiffel_Tower', '35.Monumento_a_la_Revolucion', '14.Terminal_Tower', '28.Sydney_Harbour_Bridge', '29.Petronas_Towers']

(IMPLEMENTATION) Write Your Algorithm, Part 2

In the code cell below, implement the function suggest_locations, which accepts a file path to an image as input, and then displays the image and the top 3 most likely landmarks as predicted by predict_landmarks.

Some sample output for suggest_locations is provided below, but feel free to design your own user experience!

In [82]:
def suggest_locations(img_path):

    predicted_landmarks = predict_landmarks(img_path, 3)
    
    landmark = cv2.imread(img_path)
    
    plt.imshow(landmark)
    
    plt.show() 
    
    print('This landmark is one of these landmarks below')
    
    print(predicted_landmarks)



# test on a sample image
suggest_locations('/data/landmark_images/test/18.Delicate_Arch/3cd1b3b080985a20.jpg')
This landmark is one of these landmarks below
['18.Delicate_Arch', '03.Dead_Sea', '42.Death_Valley_National_Park']

(IMPLEMENTATION) Test Your Algorithm

Test your algorithm by running the suggest_locations function on at least four images on your computer. Feel free to use any images you like.

Question 4: Is the output better than you expected :) ? Or worse :( ? Provide at least three possible points of improvement for your algorithm.

Answer:

I think it did better than I expected, it predicted all classes of the images (the class of the desired image was one of the predicted classes).

we can make the model better in many ways, for example:

  • Using more epochs for training the model.

  • Adding more data, in order to make the model learn the differences between landmarks better.

  • Changing the values in the layers (Model Architecture) to better values.

In [83]:
model_transfer.load_state_dict(torch.load('model_transfer.pt'))

test_images_files = np.array(glob("/data/landmark_images/test/*/*"))

for pic in np.hstack((test_images_files[:5])):
    suggest_locations(pic)
This landmark is one of these landmarks below
['49.Temple_of_Olympian_Zeus', '40.Stockholm_City_Hall', '31.Washington_Monument']
This landmark is one of these landmarks below
['49.Temple_of_Olympian_Zeus', '30.Brooklyn_Bridge', '40.Stockholm_City_Hall']
This landmark is one of these landmarks below
['49.Temple_of_Olympian_Zeus', '48.Whitby_Abbey', '07.Stonehenge']
This landmark is one of these landmarks below
['31.Washington_Monument', '48.Whitby_Abbey', '49.Temple_of_Olympian_Zeus']
This landmark is one of these landmarks below
['49.Temple_of_Olympian_Zeus', '21.Taj_Mahal', '30.Brooklyn_Bridge']